Meteorology and weather forecasting are crucial for predicting future
climate conditions. Forecasts are helpful when they provide information
that assists people in making better decisions. Big data techniques are now
used to analyze social media information, including posts from people who
rely on the weather forecast. Recent years have seen the widespread use of
machine learning and deep learning for processing messages on social media
sites like Twitter. In this study, the authors analyzed weather-related text
in Indonesia based on searches made on Twitter. Three machine learning
algorithms were examined: support vector machine (SVM), multinomial
logistic regression (MLR), and multinomial Naive Bayes (MNB), as well as
the pretrained bidirectional encoder representations from transformers (BERT),
which was fine-tuned with several additional layers to ensure effective
classification. The BERT model achieved an F1-score of 99%, higher than that
of any of the other machine learning methods. These results have been
incorporated into a web-based weather information system. The weather
classification results were mapped using the Esri Maps application programming
interface (API) based on the geolocation of the data.
This is an open access article under the CC BY-SA license. 
1. INTRODUCTION


Indonesia has a sea area of 6.22% of its relative area. The Indonesian territory is therefore
characterized by a marine climate [1]. Global warming has changed the climate, especially during the
dry and rainy seasons. The dry season now lasts longer, whereas the rainy season is shorter
and occurs at a different time [2], [3]. The multiple physical mechanisms involved and the dynamic
nature of rainfall make it difficult to determine its consistency [4]. The Intergovernmental Panel on Climate
Change (IPCC) points out that climate change will require adaptation to environmental, social, and economic
factors. The weather often changes in Indonesia because of its tropical location. The government mandates the
provision of real-time weather data to support community activities [5], [6].

Advances in technology have led to progress in disseminating information; most of
the information that the community receives comes from social media [7]. The government distributes
publications in a variety of ways to meet the information needs of the public. Furthermore, the public aims to
stay informed about what happens around them, especially in relation to relevant events [8]. Twitter is used
by people worldwide to access many types of information.
Monitoring topics and events is made easier with a structured combination of search parameters on a Twitter
channel. We implemented geolocation by using available application programming interfaces (APIs) and
web services. Using existing APIs, location-specific terms were detected in a tweet. Social media platforms
continually generate and deliver information in real time from various sources to users. Topics,
hashtags, geographic location, and language are extracted from tweets. In addition to scraping followers, likes,
and retweets, a Python package called Twitter Intelligence Tool (Twint) allows users to identify their
followers.

A variety of Twitter accounts, mostly related to information, have emerged all over the world in the
last few years, most notably in Indonesia [9], [10]. The platform can be used to track public discussions about
issues that have been shared via Twitter. Data and information from Twitter have been used for
classification tasks in a number of projects [11], [12]. For example, the K-nearest neighbor (KNN) algorithm
has been used to identify the personalities of a company's potential employees: KNN assigned Myers-Briggs
type indicator (MBTI) categories to potential employees based on their tweets [13].

Deep learning has improved performance in various fields, covering data types such as images
[14], [15], time series data [16], sounds [17], and text [18], [19]. Owing to its time requirements and costs,
bidirectional encoder representations from transformers (BERT) presents a challenge when used to classify
large datasets, but it is generally still used because it is relatively inexpensive to train. Thus, the authors used
the BERT algorithm, which can only learn from datasets containing at least 256 characters [20]. In this study,
we investigated whether sentiment in texts can be classified using BERT-base. Using the Pontiki
dataset, known as the laptop dataset [21], a BERT model fine-tuned with several layers by Alexander Rietzler
has been successful in detecting sentiment. Before the BERT method, the authors developed a machine learning
classification method using the support vector machine (SVM) technique, which
provided 93% accuracy. In contrast, other machine learning algorithms, such as multinomial Naive Bayes
(MNB) and multinomial logistic regression (MLR), did not achieve highly accurate predictions when applied
to Twitter data about weather conditions [22].

Data collected by the Meteorology, Climatology, and Geophysics Agency (BMKG) can be obtained 
from a number of sources. In Figure 1, it can be seen that the BMKG collects data and integrates them with 
each other to provide information on meteorology, climatology, and geophysics. The integration capabilities 
of BMKG can be enhanced by implementing a big data system that integrates multiple data sources. The first 
step to gathering weather data is to use automatic surface air instruments like automatic weather stations 
(AWS) and automatic rain gauges (ARG). An ARG is an instrument that measures rainfall. The two methods 
of recording data using this tool are manually (non-recording) and automatically (self-recording). In addition 
to rainfall information, weather forecasts also require data such as temperature, wind speed, and air humidity.
These data can be obtained from AWS.


Figure 1. Global observing system on meteorology, climatology, and geophysics
The National Weather Service (NWS) forecasts and issues warnings for weather and hydrologic
conditions in the United States, its territories, and adjacent waters and oceans, for the purpose of protecting
lives and property and enhancing the nation's economy. As a complementary service, the NWS delivers
Twitter feeds as a means of enhancing the reach of its information. In addition to disseminating
environmental information, the NWS engages in outreach and education to increase awareness of weather
conditions.

In this paper, we propose a machine learning method to integrate real-time weather data about
Indonesia to support data diversity. Data from Twitter are used as the basis for this machine learning process.
The crawled data are geolocated according to the Twitter location data and entered into the database. The
paper is organized as follows: it first describes how weather information was collected using Twitter, followed
by a description of the methodology used to analyze the data, and then summarizes the results and discusses
them in detail.


2. METHOD 

A Twitter framework for providing weather information can be seen in Figure 2, which illustrates
how data are stored in a database and reported in real time to netizens through an Android application. In the
text preprocessing phase, uniform resource locators (URLs) are removed and unused words are eliminated,
including stop words in the Indonesian language. Special characters are also removed. In the classification
phase, the authors determine the class based on the labels generated during the training process and by the
weather consultant. Geolocation filling is done based on the name of the district or city mentioned in a tweet.


[Figure 2 components: sensor weather station, weather consultant, geolocation filling, weather database,
real-time forecast result]

Figure 2. Integrated source data for the weather information system


2.1. Dataset 

GetOldTweets3, a Python 3 library for retrieving old tweets, was used to crawl tweets for the dataset.
Tweets posted from January to May 2019 were crawled if they contained the keywords listed in Table 1,
which are derived from Indonesian (and shown here translated into English). An Indonesian tweet is marked by
the language code 'id'. A total of 506 tweets were labeled. Following the Pareto ratio, the dataset was
split 80:20 into training and testing sets. This process yielded 404 tweets for training and
102 tweets for testing.
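
The 80:20 split described above can be sketched with the standard library alone. This is a minimal illustration, not the authors' code: the function name, the fixed seed, and the choice to round the test share up are assumptions, the last one made because it reproduces the reported 404/102 counts for 506 tweets.

```python
import math
import random


def split_80_20(items, test_fraction=0.2, seed=42):
    """Shuffle the items and hold out a fraction for testing.

    The test share is rounded up (an assumption), so 506 items
    give 102 test items and 404 training items.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed: reproducible, hypothetical choice
    n_test = math.ceil(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder identifiers standing in for the 506 labeled tweets.
tweets = [f"tweet_{i}" for i in range(506)]
train, test = split_80_20(tweets)
print(len(train), len(test))
```

A stratified split, which keeps the class proportions of Figure 3 in both sets, may be preferable for imbalanced data; whether the study stratified is not stated.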
Table 1. Keywords for each class

Class         Keywords
Cloudy        Thick clouds, cloudy clouds, dark clouds
Sunny         The body gets wet of sweat, bright light, ill, clear, hot
Rainy         Rainy, rain, rainfall
Heavy rain    Lightning, thunderstorm, thunder, soaking wet
Light rain    Light rain, spatter, drizzle, spatter


In all, five classes of data were analyzed, namely "light rain", "heavy rain", "rainy", "sunny", and
"cloudy". Figure 3 summarizes how these labels were distributed. Due to the similarity of the keywords for
"rainy", "heavy rain", and "light rain", the amount of data in these classes is lower than the amount of data in
the "sunny" class.


[Figure 3: bar chart of the total number of tweets per class (cloudy, sunny, heavy rain, light rain, rainy)]

Figure 3. Data distribution


2.2. Pre-processing 
Tweets are converted to lowercase. As shown in Figure 4, the following are removed from the content:
excessive newline characters and whitespace, URLs, Twitter and Instagram formatting, and letters outside the
American Standard Code for Information Interchange (ASCII). Emojis in tweets are translated using a list of
116 emoji symbols stored in a .txt file, while slang words are normalized using a list of 2,879 slang
words stored in a text file. A [CLS] token is added to the beginning of each text for BERT classification.


Original text: "The Indonesian Regional Weather Early Warning is valid until tomorrow. Be aware of the
potential for heavy rain accompanied by strong winds and lightning which can result in landslides,
floods. . . Good morning and happy activities... :)"

Preprocessed text: "the indonesian regional weather early warning is valid until tomorrow. be aware of the
potential for heavy rain accompanied by strong winds and lightning which can result in landslides floods.
good morning and happy activities"

Figure 4. Comparison of tweet before and after text preprocessing
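
The cleaning steps in this subsection could be sketched as follows. This is an illustrative approximation rather than the authors' pipeline: the two lookup tables are tiny hypothetical stand-ins for the 116-entry emoji file and the 2,879-entry slang file, and treating mentions and hashtags as "Twitter formatting" is an assumption.

```python
import re

# Hypothetical stand-ins for the emoji and slang lookup files described above.
EMOJI_MAP = {"\U0001F327": "hujan"}  # cloud-with-rain emoji -> Indonesian word for rain
SLANG_MAP = {"bgt": "banget"}        # common abbreviation -> full word


def preprocess(tweet: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, hashtags, and non-ASCII text."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)        # remove mentions and hashtags (assumption)
    for emoji, word in EMOJI_MAP.items():       # translate known emojis to words
        text = text.replace(emoji, f" {word} ")
    text = text.encode("ascii", "ignore").decode("ascii")  # drop remaining non-ASCII
    text = re.sub(r"[^a-z0-9.\s]", " ", text)   # remove special characters, keep periods
    tokens = [SLANG_MAP.get(token, token) for token in text.split()]
    return " ".join(tokens)                     # split/join also collapses whitespace


example = "Hujan deras bgt di Jakarta!! \U0001F327 https://t.co/abc @BMKG #cuaca\n\n"
print(preprocess(example))  # hujan deras banget di jakarta hujan
```

URLs are removed before special characters so that the pattern can still recognise the "://" separator.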


2.3. Feature extraction for machine learning algorithms 

TfidfVectorizer is a feature extraction tool based on term frequency-inverse document
frequency (TF-IDF) that specifically processes words in a document [23]. With this method, the inverse
document frequency of a word (term) can be tracked [24]. The TF is calculated by counting how many times
a term occurs in a document. The IDF determines how much weight a term carries across the corpus: the more
documents a term appears in, the lower its weight. TF and IDF are then combined to give the final weight.
To calculate the weight (W) of each document against keywords, the TF-IDF algorithm uses (1),


$$W_{d,t} = Tf_{d,t} \times Idf_{d,t} \tag{1}$$

where $W_{d,t}$ is the weight of document $d$ against term $t$, and $Tf_{d,t}$ is the frequency of occurrence
of term $t$ in document $d$ divided by the total number of terms in document $d$, as explained in (2),

$$Tf_{d,t} = \frac{f_d(t)}{\sum_{t' \in d} f_d(t')} \tag{2}$$

$Idf_{d,t}$ is the function that reduces the weight of a term if its appearance is scattered throughout the
documents, as spelled out in (3),

$$Idf_{d,t} = \log\left(\frac{N}{df_t + 1}\right) \tag{3}$$

where $df_t = |\{d \in D : t \in d\}|$ is the number of documents containing term $t$ and $N$ is the total
number of documents in the corpus, $N = |D|$. Adding 1 avoids division by 0 if term $t$ is not present in the
corpus [25].
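
As a worked illustration of (1) to (3), the following sketch computes the weights directly with the standard library. The toy corpus and function names are hypothetical, and the natural logarithm is assumed. Note that scikit-learn's TfidfVectorizer applies a differently smoothed IDF and L2 normalisation by default, so its values will not match this literal implementation.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Equation (2): occurrences of the term divided by all terms in the document."""
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term, corpus):
    """Equation (3): log(N / (df_t + 1)), natural logarithm assumed."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / (df + 1))


def tfidf(term, doc_tokens, corpus):
    """Equation (1): weight of a document against a term."""
    return tf(term, doc_tokens) * idf(term, corpus)


# Hypothetical four-document corpus of already tokenised tweets.
corpus = [
    ["heavy", "rain", "and", "lightning"],
    ["light", "rain", "this", "morning"],
    ["clear", "and", "hot", "today"],
    ["dark", "clouds", "today"],
]

for term in ("rain", "lightning"):
    print(term, round(tfidf(term, corpus[0], corpus), 4))
```

In the first document, "lightning" (found in one document) receives a higher weight than "rain" (found in two), which is the intended effect of the IDF factor.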


2.4. Classification method based on machine learning approaches 

SVMs are supervised learning classification methods. In the SVM method, the original training data are
mapped into a higher dimension using nonlinear mappings. The goal of this technique is to find, among all possible
functions, the best separator function (hyperplane) to separate pairs of classes. In general, the best hyperplane
can be defined as the boundary separating two classes of objects. Using an SVM, the best hyperplane is
constructed by maximizing the margin, or distance, between two different sets of classes [26], [27].

Multinomial Naive Bayes is a development of the Naive Bayes method that determines a probability
value from how often a word appears in a sentence, so the probability value changes according to the
frequency with which that word appears. However, a problem arises in the multinomial Naive Bayes method
if a word does not appear in a given class: the resulting probability is
zero [28].

The scikit-learn Python package provides the Laplace smoothing method, which avoids zero
probabilities. This method works by adding a value $\alpha$, which must be greater than 0, to each count. This
value is set to 1.

$$P(w_i \mid c_j) = \frac{count(w_i, c_j) + \alpha}{count(c_j) + |V|} \tag{4}$$

where $P(w_i \mid c_j)$ is the probability value of word $i$ against class $j$, $count(w_i, c_j)$ is the number of
occurrences of word $i$ in class $j$, and $\alpha$ is the value of Laplace smoothing (default $\alpha = 1$). Then,
$count(c_j)$ is the number of members of class $j$ and $|V|$ is the number of unique words across all classes.
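
The effect of the smoothing in (4) can be illustrated with a small sketch. The function name, the toy class content, and the vocabulary are hypothetical. The denominator here uses $\alpha |V|$, the general form used by scikit-learn, which coincides with (4) for the default $\alpha = 1$.

```python
from collections import Counter


def smoothed_word_prob(word, class_tokens, vocabulary, alpha=1.0):
    """Laplace-smoothed P(word | class); equals equation (4) when alpha is 1."""
    counts = Counter(class_tokens)
    return (counts[word] + alpha) / (len(class_tokens) + alpha * len(vocabulary))


# Hypothetical tokens observed in the "heavy rain" class and a toy vocabulary.
heavy_rain_tokens = ["lightning", "thunder", "rain", "rain"]
vocabulary = {"lightning", "thunder", "rain", "drizzle", "hot"}

p_seen = smoothed_word_prob("rain", heavy_rain_tokens, vocabulary)       # (2 + 1) / (4 + 5)
p_unseen = smoothed_word_prob("drizzle", heavy_rain_tokens, vocabulary)  # (0 + 1) / (4 + 5)
print(p_seen, p_unseen)
```

The unseen word "drizzle" receives a small non-zero probability instead of zero, so it no longer cancels the whole product of word probabilities for the class.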

In machine learning, MLR, also called softmax regression, is a method of separating
feature vectors into several classes. This method generalizes the logistic regression classification scheme to
multiclass problems [29]. The main difference between the methods is the activation function:
logistic regression uses the sigmoid activation function, while MLR uses the softmax activation
function. The scikit-learn logistic regression class in Python can be set up to use MLR by setting the
multi-class option to "multinomial".
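
The relationship between the two activation functions can be checked numerically. The sketch below is a generic illustration with hypothetical scores, not part of the study's code: softmax turns one score per class into a probability distribution, and with two classes it reduces to the sigmoid of the score difference.

```python
import math


def sigmoid(z):
    """Binary logistic regression activation."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Multinomial logistic regression activation (numerically stabilised)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for the five weather classes.
classes = ["cloudy", "sunny", "rainy", "heavy rain", "light rain"]
probs = softmax([0.2, 2.1, 0.4, -1.0, 0.3])
print(classes[probs.index(max(probs))])  # sunny

# With two classes, softmax over [z, 0] matches sigmoid(z).
print(abs(softmax([1.3, 0.0])[0] - sigmoid(1.3)) < 1e-12)  # True
```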


2.5. BERT as a deep learning method 

BERT is a bidirectional method based on the transformer architecture, which replaces the sequential
processing of long short-term memory (LSTM) and gated recurrent units (GRU) with a faster attention approach.
Additionally, the method was pre-trained on two unsupervised tasks: masked
language modeling and next sentence prediction. The pre-trained BERT method is utilized to perform downstream
tasks like sentiment classification, intent detection, and question answering [30].

In multi-label classification, documents may be classified according to multiple labels or classes
simultaneously and independently. Multi-label classification has numerous real-world applications, such as
categorizing businesses or assigning multiple genres to a film [31]. It can be used
in customer service to determine multiple intentions for a customer email [32].
BERT-Base has a vocabulary of 30,522 tokens. Tokenization consists of splitting input text into
tokens within the vocabulary. BERT uses WordPiece tokenization for words that are not in its
vocabulary: out-of-vocabulary words are gradually subdivided into sub-words and then represented by groups
of sub-words [33].
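
The sub-word splitting described above can be sketched as a greedy longest-match-first search. The six-entry vocabulary is hypothetical (the real BERT-Base vocabulary has 30,522 entries), and the "##" prefix for word-internal pieces follows the convention of the published BERT vocabularies.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split one word into the longest vocabulary pieces, scanning left to right."""
    tokens, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:                       # try the longest remaining substring first
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate     # mark continuation pieces
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:                        # no piece fits: the whole word is unknown
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary.
TOY_VOCAB = {"rain", "##fall", "##y", "thunder", "##storm", "##s"}

print(wordpiece_tokenize("rainfall", TOY_VOCAB))       # ['rain', '##fall']
print(wordpiece_tokenize("thunderstorms", TOY_VOCAB))  # ['thunder', '##storm', '##s']
print(wordpiece_tokenize("drizzle", TOY_VOCAB))        # ['[UNK]']
```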


2.6. Fine-tuning BERT 

BERT is a network architecture that has been trained using large datasets from a wide variety of
articles in multiple languages. Consequently, rather than training the BERT layers, which already have very good
weights, from scratch, researchers only need to fine-tune them for text classification [34]. Figure 5 depicts
the model: the BERT input layer, which is fed the pre-processed tweets, is followed by one dense layer with a
tanh activation function, two dropout layers with a rate of 0.5, and one output layer with a softmax activation
function and cross-entropy loss. The two dropout layers are intended to prevent overfitting. Overfitting occurs
when a model fits the training data so closely that it becomes too dependent on them, so the results are
incorrect when new data are provided for classification [35]. The model was trained for 10
epochs with a batch size of 5 and a sequence length based on the length of the dictionary from a
tweet, which is the maximum for the previously trained model. The AdamW optimizer was used with a learning
rate of 3e-5.


[Figure 5: input tokens (Token 1 ... Token N, Token 1 ... Token M, Token 1 ... Token X) for Sentence 1,
Sentence 2, and Sentence 3 of a tweet, fed into the BERT fine-tuning model]

Figure 5. BERT fine-tuning model
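
To make the role of the dropout layers concrete, the following sketch shows "inverted" dropout with a rate of 0.5 on a plain list of activations. It is a generic illustration under the assumption that the framework used in the study behaves like common deep learning libraries; the function name and inputs are hypothetical.

```python
import random


def dropout(values, rate=0.5, training=True, rng=None):
    """Zero each activation with probability `rate` and rescale the survivors.

    Rescaling by 1 / (1 - rate) keeps the expected activation unchanged,
    so the layer can simply be switched off at prediction time.
    """
    if not training or rate == 0.0:
        return list(values)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


activations = [1.0] * 1000
dropped = dropout(activations, rate=0.5, rng=random.Random(0))
print(sum(1 for v in dropped if v == 0.0))       # roughly half are zeroed
print(dropout([0.3, -1.2], training=False))      # unchanged at prediction time
```

Because a different random subset of units is silenced for every batch, the network cannot rely on any single activation, which is why dropout acts against overfitting.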


2.7. Evaluation metrics 
Confusion matrices are commonly used for calculating accuracy. The confusion matrix compares
the results generated by the model (system) with the actual
labels [36]. As shown in Table 2, four terms represent the classification results: TP, TN, FP, and
FN represent true positive, true negative, false positive, and false negative, respectively. The F1-score,
recall, and precision values are computed in order to determine the accuracy of the results. Models are
evaluated based on their F1-score, as it is well suited to imbalanced datasets [37]. Using (5), the F1-score,
which gives the same weighting to recall and precision, can be calculated for each class.


$$F_1 = \frac{2 \times (Precision \times Recall)}{Precision + Recall} \times 100\% \tag{5}$$

There is also a weighted F-score, in which recall and precision can be assigned different weightings, as
given in (6).

$$F_\beta = \frac{(1 + \beta^2) \times (Precision \times Recall)}{(\beta^2 \times Precision) + Recall} \times 100\% \tag{6}$$

$\beta$ reflects how much more important recall is than precision. The value of $\beta$ is 2 if recall is twice as
significant as precision [38].
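
Equations (5) and (6) can be checked against the per-class values reported later in Table 5. In this sketch, precision and recall are passed as fractions and the result is returned as a percentage; the function name is hypothetical.

```python
def f_beta(precision, recall, beta=1.0):
    """Equation (6) as a percentage; beta = 1 gives the F1-score of equation (5)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall) * 100.0


# Precision and recall of the "heavy rain" and "rainy" classes from Table 5.
print(round(f_beta(0.89, 1.0), 1))    # 94.2
print(round(f_beta(1.0, 0.962), 1))   # 98.1

# With beta = 2, recall counts more, so the perfect recall of "heavy rain" raises the score.
print(round(f_beta(0.89, 1.0, beta=2), 1))
```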


Table 2. Confusion matrix

                                          Actual class
Predicted class    Relevant                               Non-relevant
Retrieved          Correct result: true positive (TP)     Unexpected result: false positive (FP)
Not retrieved      Missing result: false negative (FN)    Correct absence of result: true negative (TN)


The precision (7) indicates the system's ability to return only relevant documents and is defined as
the percentage of retrieved documents that are relevant to the query. The recall (8) measures the ability of the
system to locate all relevant items in a document collection and is defined as the percentage of relevant
documents that are retrieved. The accuracy (9) is the ratio of correctly identified cases to the total
number of cases, whereas the error rate (10) is the corresponding ratio of incorrectly identified cases.

TP = The number of relevant items that are correctly retrieved.
FP = The number of non-relevant items that are incorrectly retrieved.
FN = The number of relevant items that are incorrectly not retrieved.
TN = The number of non-relevant items that are correctly not retrieved.


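$$Precision = \frac{TP}{TP + FP} \times 100\% \tag{7}$$

$$Recall = \frac{TP}{TP + FN} \times 100\% \tag{8}$$

$$Accuracy = \frac{TP + TN}{TP + FP + TN + FN} \times 100\% \tag{9}$$

$$Error\ Rate = \frac{FP + FN}{TP + FP + TN + FN} \times 100\% \tag{10}$$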
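
Equations (7) to (10) translate directly into code. The sketch below uses hypothetical function names; the example counts are an interpretation of the BERT results for the "heavy rain" class in Section 3.2 (8 correct predictions, 1 "rainy" tweet predicted as "heavy rain", no misses, 102 test tweets in total) and are meant only as an illustration.

```python
def percentage(numerator, denominator):
    """Return a ratio as a percentage, or 0.0 when the denominator is zero."""
    return 100.0 * numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy, and error rate as in equations (7) to (10)."""
    total = tp + fp + fn + tn
    return {
        "precision": percentage(tp, tp + fp),
        "recall": percentage(tp, tp + fn),
        "accuracy": percentage(tp + tn, total),
        "error_rate": percentage(fp + fn, total),
    }


# One-vs-rest counts for "heavy rain" (interpretation, see lead-in).
metrics = binary_metrics(tp=8, fp=1, fn=0, tn=93)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

Accuracy and error rate always sum to 100%, which is a quick sanity check when filling a table such as Table 3.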


2.8. Database management and geolocation filling 

Purwandari et al. developed a database management system that manages weather data for
Indonesia. Three types of users are involved in this system: netizens, forecasters, and data engineers. Data
from netizens are collected from tweets on Twitter, forecasters rely on data from BMKG sensors throughout
Indonesia, and all data are analyzed by data engineers before being reported to the public. A data dictionary,
entity-relationship diagrams, and use cases have been used to visualize all completed data [39].

However, less than 1% of the crawled tweets include geolocation information. Therefore,
it is very important to accurately predict the location of non-geo-tagged tweets when
analyzing data in different domains. We address this by extending the city/district database with
district/city aliases that reflect the crawled tweets. Using this method, tweets from remote areas of
Indonesia can still be displayed with their longitude and latitude even if the global positioning system (GPS)
is not turned on. The content of tweets and their metadata can be used to identify a user's location even
when only Twitter has access to the actual location information. In such a case, third parties have to use other
sources to identify the geolocation of a user or tweet.
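
Geolocation filling from place names can be sketched as a dictionary lookup. This is a simplified illustration, not the authors' database: the gazetteer holds only the two coordinates quoted elsewhere in this paper (South Jakarta and Karo), and the alias entries and function name are hypothetical.

```python
import re

# Coordinates (latitude, longitude) quoted in Sections 3.3 and 3.4 of this paper.
GAZETTEER = {
    "south jakarta": (-6.2615, 106.8106),
    "karo": (3.1053, 98.2651),
}
# Hypothetical aliases of the kind added to the city/district database.
ALIASES = {"jaksel": "south jakarta", "jakarta selatan": "south jakarta"}


def fill_geolocation(tweet_text):
    """Return (canonical name, (lat, lon)) for the first place name found, else None."""
    text = tweet_text.lower()
    names = {name: name for name in GAZETTEER}
    names.update(ALIASES)
    for name in sorted(names, key=len, reverse=True):  # prefer the longest match
        if re.search(rf"\b{re.escape(name)}\b", text):
            canonical = names[name]
            return canonical, GAZETTEER[canonical]
    return None


print(fill_geolocation("Hujan deras di Jaksel sore ini"))
print(fill_geolocation("udan deres karo angin"))  # "karo" used as an ordinary word
print(fill_geolocation("cerah sekali hari ini"))
```

The second call shows the limitation of pure name matching: a tweet that uses "karo" as an ordinary regional-language word is still placed in the Karo district.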
3. RESULTS AND DISCUSSION
3.1. Evaluation of machine learning algorithms 

This study compared three machine learning methods. The results are shown in Table 3. According
to this table, SVM can successfully classify Twitter texts about the weather with a recall value of 87.3% and
a precision value of 90.6%. The MLR method yields 83.3% recall and 90.3% precision. With the MNB
method, the recall value is 73.6% and the precision is 86.3%. SVM provided the most accurate results,
followed by MLR and then MNB, and also had the lowest error rate. SVM has thus proven to be effective in
text classification, especially for weather-related tweets. In the weather text classification test, the recall
value was lower than the precision value for every method; the precision level in this text classification was
therefore found to be effective. SVM is popular for its accuracy and recall, which was confirmed in this test.


Table 3. Classifier evaluation using machine learning approaches (%)

Model    Precision    Recall    F1-score    Accuracy    Error rate
SVM      90.6         87.3      88.1        87.3        12.7
MLR      90.3         83.3      85.6        83.3        16.7
MNB      86.3         73.6      71.5        73.5        26.5


A confusion matrix helps in understanding the behavior of machine learning models. The columns of the
confusion matrix represent instances of the prediction class, whereas the rows represent instances of the
actual class. The confusion matrix results illustrated in Figures 6(a) to 6(c) support the aforementioned
results. Figure 6(a) illustrates the confusion matrix results from using SVM. Based on Figure 6(b), the
MLR method is also quite effective at classifying weather-related tweets on Twitter. Figure 6(c) shows
that the results of the MNB method are poor for the "light rain" class, since no TPs are generated in this class
as indicated by the confusion matrix.


[Figure 6: three confusion matrices, panels (a), (b), and (c), over the classes cloudy, sunny, rainy,
heavy rain, and light rain; horizontal axis: actual class]

Figure 6. The confusion matrix results illustrated in (a) SVM, (b) MLR, and (c) MNB


Int J Artif Intell, Vol. 12, No. 1, March 2023: 271-283 


Int J Artif Intell ISSN: 2252-8938


3.2. Evaluation of BERT method 

The confusion matrix of the BERT method is depicted in Figure 7. The "cloudy", "sunny", and
"light rain" classes were classified perfectly, meaning that these three classes have exactly the same
number of TP results as the actual number of sentences. One "rainy" data point was predicted as "heavy rain",
which is an FN for the "rainy" class and an FP for the "heavy rain" class. Correct
predictions are located on the diagonal of the confusion matrix, so unexpected predictions are visually
obvious because they lie off the diagonal. As shown in Table 4, the precision, recall, F1-score, accuracy, and
error rate of the BERT method are 99.1%, 99%, 99%, 99%, and 1%, respectively.


[Figure 7: confusion matrix over the classes cloudy, sunny, rainy, heavy rain, and light rain;
horizontal axis: actual class]

Figure 7. Confusion matrix of BERT model

Table 4. Classifier evaluation using BERT method (%)

Model    Precision    Recall    F1-score    Accuracy    Error rate
BERT     99.1         99        99          99          1


Table 5 provides the precision, recall, and F1-score for each class. The BERT
model achieved the maximum F1-score for the "cloudy", "sunny", and "light rain" classes, and it also performed
well for the "rainy" and "heavy rain" classes. The reliability of a model's output can be checked by
analyzing the training and validation losses. In Figure 8, it can be seen that the training
loss is mostly constant, with an increase from epoch 4 to epoch 5, whereas the validation loss is
more unstable, also increasing from epoch 4 to epoch 5. Validation losses tend to fluctuate because of
the random input data each epoch receives. Overall, the BERT model appears robust and stable.
However, owing to the imbalanced distribution of data shown in Figure 3, overfitting to the
training data occurred.


Table 5. Evaluation metrics for individual classes using BERT model (%)

Class         Precision    Recall    F1-score
Cloudy        100          100       100
Sunny         100          100       100
Rainy         100          96.2      98.1
Heavy rain    89           100       94.2
Light rain    100          100       100


Additionally, Figure 9 displays the F1-scores for each epoch, complementing Figure 8. Despite the
decrease from epoch 8 to epoch 9, the F1-score generally increases with each succeeding
epoch. Training proceeds in batches of five, and each batch must be completed before the weights are
changed. Weights are updated based on the estimated sum of losses. The loss function is computed using the
output of the BERT layers. The weights with the best quality are saved for testing after the
epoch has ended.


3.3. Web-based weather report 
After classification with the BERT model, the next step is to fill in the empty geolocations of the tweets.
Geolocations are determined based on the latitude and longitude coordinates of the cities and districts from the
Central Agency on Statistics (BPS), which have been integrated into the database. This is a necessary step
before integrating weather information on a website. Once the geolocation point has been filled, the latitude
and longitude points are plotted on Esri Maps. An example of plotting weather reports submitted by netizens
on Esri Maps is shown in Figure 10. As can be seen in the report, the first location mentioned is "South
Jakarta". Consequently, the tweet is positioned at coordinates 6.2615° S, 106.8106° E, which are the
coordinates of South Jakarta.


[Figures 8 and 9: line plots over epochs 1 to 10; Figure 8 shows the training and validation loss,
Figure 9 shows the weighted validation F1-score]

Figure 8. The plot of model loss on training and validation datasets

Figure 9. The plot of F1-score result on validation datasets


[Figure 10 content: "Netizen Weather Report" map popup. User: @republikaonline; Weather: Heavy Rain;
Tweet: "During the day, thunderstorms are forecast in South Jakarta and East Jakarta.
https://t.co/ftRtLSZWwz"; via Twitter, 2 hours ago; coordinates: -6.20828, 106.84249]

Figure 10. Example of tweets integrated into a geographic information system (GIS) with weather
classification and geolocation plotting


3.4. Discussion 

This study focused on comparing basic machine learning models (SVM, MLR, and MNB)
with a deep learning model (BERT) for text classification. The best classification results are intended to be
applied to a website-based information system. Among the machine learning models, the best results were
given by the SVM model. These classification results were compared with those of an advanced model, the
BERT transformer. In recent years, BERT has achieved state-of-the-art results in a wide range of natural
language processing (NLP) tasks [40]. The application of BERT
transfer learning to a text dataset on weather has proven to provide good results. BERT is a big
neural network architecture with a huge number of parameters, ranging from 100 million to over 300
million. Training a BERT model from scratch on a small dataset would therefore result in overfitting [41].

The training and validation losses shown in Figure 8 show evidence of overfitting.
This can happen when the model is too focused on one training dataset, so it cannot
predict correctly when given another similar dataset [40], [42]. Figure 3 shows the distribution of the
training data. Tweets about sunny weather are the most numerous in the period between
January and May 2019. The dry season begins in April and May; March is therefore the transitional period
between the rainy and dry seasons. Cloudy weather and heavy rain follow with almost equal counts.
According to BMKG data, Indonesia entered its rainy season only in January or early February of 2019.
Despite the similarity in words between heavy rain, light rain, and rain in the tweets, the "heavy rain",
"light rain", and "rainy" classes contain comparatively few tweets. Because of the ambiguity in the labels, it
is difficult to determine which category such tweets belong to. For example: "There is a high probability that
rain will drench the entire DKI Jakarta area today. We are expecting light rain to heavy rain in the morning".

In geolocation filling, ambiguity in the district/city names mentioned in a
tweet also affects the plotting results on Esri Maps. Because of the diversity of ethnic groups in Indonesia,
regional languages are used in everyday communication. According to data from the BPS, there are 1,340
tribes or ethnic groups in Indonesia. Meanwhile, according to the Language Development and
Fostering Agency, the number of regional languages in Indonesia was 646 at the beginning of 2017. The
similarity between regional-language words and the names of regions affects geolocation filling and causes
the plotting on Esri Maps to not match the area names mentioned in the tweet. For example, the word "karo"
is a regional-language word from the Central Java region, and it is also the name of a district in
North Sumatra called "Karo". In this case, the text will be plotted at coordinates 3.1053° N, 98.2651° E on
Esri Maps, which is the location of the "Karo" district.


4. CONCLUSION 

Twitter has proved to be an effective tool for opinion mining and polling, especially in
predicting weather conditions. Based on the dataset used, the BERT-based pretrained model is effective for
classifying texts from Twitter. Identifying datasets before modeling algorithms for different classifications or
scenarios is imperative. In addition to categorizing short sentences, BERT-base is useful for other purposes.
This model achieved an F1-score of 99%, which is very good in comparison with the classical machine learning
algorithms (SVM, MNB, and MLR). The sentences classified by the BERT model were then used
for geolocation filling based on the district/city names mentioned in the tweets. Tweets are mapped onto
Esri Maps according to the geolocation points. In future work, the authors will continue mining and
analyzing more Twitter data using smart crawling to obtain more accurate predictions about weather conditions
in Indonesia.